In this study, we analyzed germline and somatic variant calls from JHU Biobank samples to explore the genomic landscape of benign and malignant tumors in people with neurofibromatosis type 1 (NF1). While the analysis is still being refined, we identified a list of the top 100 genes that carry high-impact variants in plexiform neurofibroma (PNF) and malignant peripheral nerve sheath tumor (MPNST) samples. We also explored variants in genes of interest representing important cellular pathways that have been implicated as dysfunctional in PNFs and MPNSTs. Additionally, we explored the variants in triad samples (i.e. samples from patients who provided both benign and malignant tumors) and identified a list of the top 100 genes with high-impact somatic variants in PNF and MPNST samples.
Raw FASTQ files were quality-checked using FastQC v0.11.9 and a
report was generated using MultiQC v1.8. FASTQ files were aligned to
GRCh38 using BWA 0.7.17. Duplicates were marked using GATK
MarkDuplicates, and bases were recalibrated using GATK BaseRecalibrator
and GATK ApplyBQSR (GATK v4.1.7.0). Germline and de-novo variants were
then called using Google’s DeepVariant software (deepvariant v1.1.0).
DeepVariant calls variants in two steps. In the first step
(make_examples) a human-written heuristic identifies
positions that are potentially variants and creates pileup examples of
them. In the second step (call_variants), a neural network
classifies whether the identified positions are real variants or not and
genotypes them. The identified variants were annotated using Variant
Effect Predictor (VEP v99.2) and converted to MAF files using vcf2maf
(vcf2maf v1.6.21). All of these steps were completed on Nextflow Tower
running the standardized nf-core pipeline sarek v2.7.1.
After the initial alignment and base recalibration, somatic variants were called using GATK Mutect2 software (GATK v4.1.7.0). The variants were annotated using Variant Effect Predictor (VEP v99.2) and converted to MAF files using vcf2maf (vcf2maf v1.6.21). All of these steps were completed on Nextflow Tower running the standardized nf-core pipeline sarek v2.7.1.
The samples analyzed here were sequenced in batches 1, 2, and 3 of the JHU Biobank. Batch 1 was sequenced entirely at the JHU sequencing core. Batch 2 and batch 3 samples were sequenced at WUSTL.
Variant calls for germline and de-novo mutations using DeepVariant were visualized in PNF and MPNST samples without any prefilter steps. The samples include tumor samples from 3 different sequencing batches.
All variants, regardless of whether the variant caller denoted them as
high-confidence or low-confidence calls, have been included in
unfiltered_variants. Note that these include potential false
positives. This was done to explore the full breadth of called
variants before additional filters are imposed.
Figure 1 is a summary overview of all unfiltered variants identified in the samples.
Figure 1
Figure 2 below shows oncoplots of all unfiltered variants identified in our list of genes of interest.
Figure 2
The germline and de-novo variant calls were then filtered to exclude common variants and variants with potentially low or medium deleterious consequences.
All variants that had values of “RefCall”, “common_variant”, or
“RefCall;common_variant” in the FILTER column were
excluded. This removes variants flagged as common_variant because
gnomAD_AF >= 0.0005, as well as low-confidence variant calls. A RefCall
entry occurs only in DeepVariant output files, when a candidate variant
is proposed and then specifically rejected as non-variant.
Additionally, all variants that had values of “MODERATE” or
“MODIFIER” in the IMPACT column were excluded.
Only variants that had “PASS” in the FILTER column and
“HIGH” in the IMPACT column were included in the analyses
below.
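The inclusion rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the FILTER and IMPACT column names come from the text, while the function name `keep_high_impact`, the one-dict-per-record layout, and the toy gene names are assumptions made for the example.

```python
# Minimal sketch of the inclusion rule: keep a record only if FILTER has the
# passing value and IMPACT is HIGH. The dict-per-record layout is an assumption.

def keep_high_impact(rows, passing_filter="PASS"):
    """Return the rows whose FILTER equals passing_filter and whose IMPACT is HIGH."""
    return [
        row for row in rows
        if row["FILTER"] == passing_filter and row["IMPACT"] == "HIGH"
    ]


# Toy records, invented for illustration only.
example_rows = [
    {"Hugo_Symbol": "GENE_A", "FILTER": "PASS", "IMPACT": "HIGH"},
    {"Hugo_Symbol": "GENE_B", "FILTER": "RefCall", "IMPACT": "HIGH"},
    {"Hugo_Symbol": "GENE_C", "FILTER": "PASS", "IMPACT": "MODERATE"},
    {"Hugo_Symbol": "GENE_D", "FILTER": "RefCall;common_variant", "IMPACT": "HIGH"},
    {"Hugo_Symbol": "GENE_E", "FILTER": "common_variant", "IMPACT": "MODIFIER"},
]

kept = keep_high_impact(example_rows)
print([row["Hugo_Symbol"] for row in kept])
```

Because the rule is an allow-list on both columns, any FILTER or IMPACT value that is not explicitly accepted is dropped. The `passing_filter` parameter is hypothetical and is there only because the passing value depends on the variant caller's output.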
Figure 3 below shows filtered variants in our list of genes of interest in PNF and MPNST samples.
Figure 3
We next selected the patients who provided samples of normal, benign, and malignant tissue. This set of samples is called “TRIADS”. The patients with triad samples are: “JH-2-002”, “JH-2-015”, “JH-2-016”, “JH-2-023”, “JH-2-031”, “JH-2-045”, “JH-2-055”, “JH-2-084”.
We have 5 PNF samples and 11 MPNST samples from TRIAD patients. Note that there are more MPNST samples than PNF samples. This is mainly because 1) some patients with MPNST had multiple samples sequenced, and all of them are represented in the plots, and 2) some patients with MPNST had a benign NF1 tumor other than PNF (e.g. cNF or ANF).
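As a rough illustration of how triad patients could be picked out of a sample sheet, the sketch below keeps every patient with at least one normal, one benign, and one malignant sample. The tissue labels, the `(patient_id, tissue)` layout, the function name, and the patient identifiers are all hypothetical; the actual sample annotations may be organized differently.

```python
from collections import defaultdict

# Tissue categories a patient must cover to count as a triad (assumed labels).
REQUIRED_TISSUES = {"normal", "benign", "malignant"}


def find_triad_patients(samples):
    """Return sorted IDs of patients with normal, benign, and malignant samples.

    samples: iterable of (patient_id, tissue) pairs.
    """
    tissues_by_patient = defaultdict(set)
    for patient_id, tissue in samples:
        tissues_by_patient[patient_id].add(tissue)
    return sorted(
        patient_id
        for patient_id, tissues in tissues_by_patient.items()
        if REQUIRED_TISSUES <= tissues  # subset test: all three are present
    )


# Toy sample sheet, invented for illustration only.
example_samples = [
    ("P-001", "normal"), ("P-001", "benign"),
    ("P-001", "malignant"), ("P-001", "malignant"),
    ("P-002", "normal"), ("P-002", "malignant"),
    ("P-003", "benign"), ("P-003", "normal"), ("P-003", "malignant"),
]

triads = find_triad_patients(example_samples)
triad_samples = [s for s in example_samples if s[0] in triads]
print(triads, len(triad_samples))
```

Selecting by patient rather than by sample keeps every sample from a qualifying patient, which is why a patient with several malignant samples contributes more than one sample to the plots.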
Figure 4 below shows filtered germline and de-novo variants in the top 100 genes in the triad samples.
Figure 4
Figure 5 below shows filtered variants in the triads in our list of genes of interest.
Figure 5
Somatic calls from Mutect2 were visualized in the PNF and MPNST samples without any prefilter steps. The samples include tumor samples from 3 different sequencing batches.
All variants, regardless of whether the variant caller denoted them as
high-confidence or low-confidence calls, have been included in
unfiltered_variants. Note that these include potential false
positives.
Figure 6 is a summary overview of all unfiltered variants identified in the samples.
Figure 6
Figure 7 shows an oncoplot of all unfiltered variants identified in our list of genes of interest in the PNF and MPNST samples.
Figure 7
The somatic variant calls were then filtered to exclude common variants and variants with potentially low or medium impact consequences.
All variants that had a value of “common_variant” in the
FILTER column were excluded. This removes variants flagged as
common_variant because gnomAD_AF >= 0.0005, as well as low-confidence
variant calls.
Additionally, all variants that had values of “MODERATE” or
“MODIFIER” in the IMPACT column were excluded.
Only variants that had “.” in the FILTER column and
“HIGH” in the IMPACT column were included in the analyses
below.
Figure 8 shows the top genes with somatic variants in PNF and MPNST samples after filtering out any common variants or variants that have low or medium impact.
Figure 8
Figure 9 shows filtered somatic variants in PNF and MPNST samples in our list of genes of interest. These results show the following:
Not all PNF or MPNST samples show single nucleotide variants in the NF1 gene. There may be two reasons for this: a) samples may contain microdeletions or copy number variations in the NF1 gene, which would not be detected in this analysis; b) samples may have lower tumor purity, reducing the sensitivity for detecting NF1 variants.
30% of the MPNST samples show variants in the SUZ12 gene, which is known to be affected in MPNSTs.
Figure 9
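A per-gene percentage such as the SUZ12 figure above is the share of samples that carry at least one retained variant in that gene. The sketch below shows one way such a share, and a ranking of "top" genes, could be computed. It is an assumption-laden illustration: the report does not state how genes were ranked, `Hugo_Symbol` and `Tumor_Sample_Barcode` are the standard MAF column names, and the function names and toy records are hypothetical.

```python
from collections import defaultdict


def fraction_of_samples_with_variant(rows, all_samples):
    """Map each gene to the fraction of all_samples with >= 1 variant in it.

    all_samples must list every sample in the cohort, including samples with
    no retained variants, so that the denominator is correct.
    """
    samples_by_gene = defaultdict(set)
    for row in rows:
        samples_by_gene[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    total = len(all_samples)
    return {gene: len(hit) / total for gene, hit in samples_by_gene.items()}


def top_genes(fractions, n=100):
    """Rank genes by fraction of affected samples (ties broken by gene name)."""
    return sorted(fractions, key=lambda gene: (-fractions[gene], gene))[:n]


# Toy cohort and records, invented for illustration only.
cohort = ["S1", "S2", "S3", "S4"]
records = [
    {"Hugo_Symbol": "GENE_A", "Tumor_Sample_Barcode": "S1"},
    {"Hugo_Symbol": "GENE_A", "Tumor_Sample_Barcode": "S1"},  # second hit, same sample
    {"Hugo_Symbol": "GENE_A", "Tumor_Sample_Barcode": "S2"},
    {"Hugo_Symbol": "GENE_B", "Tumor_Sample_Barcode": "S3"},
]

fractions = fraction_of_samples_with_variant(records, cohort)
print(fractions, top_genes(fractions, n=2))
```

Counting samples rather than variants means a sample with several hits in one gene is counted once, and passing the full cohort keeps samples without any retained variant in the denominator.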
We next selected the patients who provided samples of normal, benign, and malignant tissue. This set of samples is called “TRIADS”. The patients with triad samples are: “JH-2-002”, “JH-2-015”, “JH-2-016”, “JH-2-023”, “JH-2-031”, “JH-2-045”, “JH-2-055”, “JH-2-084”.
Figure 10 shows the top 100 genes with filtered somatic variants in PNF and MPNST samples. We note that NF1 is among the top 100 genes with impactful somatic variants.
Figure 10
Figure 11 shows filtered variants in our list of genes of interest in the triad samples.
As before, we note that many of the samples do not show any variants in the NF1 gene. The Biobank is currently reviewing tumor purity information for these samples to rule out any purity-related issues.
Figure 11